SWS 2016 Speech Signal Processing Workshop

The Speech Signal Processing Workshop is an annual academic gathering organized by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). This year's invited speakers include Dr. Tomas Mikolov of Facebook AI Research, Prof. Björn W. Schuller of the University of Passau, Germany, Asst. Prof. Chen-Yu Chiang of National Taipei University, Assoc. Prof. Jia-Ching Wang of National Central University, Asst. Prof. Chi-Chun Lee of National Tsing Hua University, and Dr. Ying-Hui Lai of the Research Center for Information Technology Innovation, Academia Sinica. The talks cover many different facets of speech signal processing, making the workshop an event not to be missed by academic and industry experts in Taiwan with an interest in speech signal processing, natural language processing, or music signal processing.

In addition to the invited talks, the workshop will feature presentations of results from Ministry of Science and Technology research projects, so as to promote the sharing of techniques and experience between academia and industry and to jointly explore new research and application directions for the related processing technologies. The workshop is therefore expected to substantially raise the technical level of the domestic digital signal processing industry and to greatly benefit the advancement of engineering technology.

Because registration has been very popular and seating in the main venue (Lecture Hall 101, Barry Lam Hall) is limited, guests who did not register online in advance, or who arrive after the hall is full, will be seated in a nearby simulcast area. We apologize for any inconvenience.

Important Dates

02/01 Registration and payment open

02/28 Registration deadline

03/05 Payment deadline

03/18 SWS 2016!

Time and Venue

Friday, March 18, 2016
Lecture Hall R101, Barry Lam Hall, National Taiwan University
No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan

Organizers

Department of Electrical Engineering, National Taiwan University

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

Co-organizers

Research Center for Information Technology Innovation, Academia Sinica

Department of Communication Engineering, National Taipei University

Department of Electrical Engineering, National Tsing Hua University

Engineering Technology Promotion Center, Department of Engineering and Technologies, Ministry of Science and Technology

Keynote Speakers
Learn from these great fellows

Dr. Tomas Mikolov
Facebook AI Research

Tomas Mikolov is a research scientist at the Facebook AI Research lab. His most influential work includes the development of recurrent neural network language models and the discovery of semantic regularities in distributed word representations. These projects have been released as the open-source tools RNNLM and word2vec, which have since been widely used in both academia and industry. His main research interest is to develop intelligent machines.
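
As a concrete illustration of the semantic regularities mentioned above, the sketch below queries the well-known king - man + woman ≈ queen analogy. This is a minimal sketch under assumptions: it uses the gensim library rather than the original standalone word2vec tool, and the toy corpus is far too small for the regularity to emerge reliably; real use would train on a large corpus or load pretrained vectors.

```python
# Minimal sketch of querying word-analogy regularities with gensim's
# Word2Vec (an assumption for illustration; the original work provides
# a standalone word2vec tool). The toy corpus is a placeholder.
from gensim.models import Word2Vec

sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=200, seed=1)

# The regularity: vector("king") - vector("man") + vector("woman") lands
# near vector("queen") when trained on sufficient data.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```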

Asst. Prof. Chen-Yu Chiang
Department of Communication Engineering, National Taipei University

Chen-Yu Chiang was born in Taipei, Taiwan, in 1980. He received the B.S., M.S., and Ph.D. degrees in communication engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2002, 2004, and 2009, respectively. In 2009, he was a Postdoctoral Fellow at the Department of Electrical Engineering, NCTU, where he primarily worked on prosody modeling for automatic speech recognition and text-to-speech systems under the guidance of Prof. Sin-Horng Chen. In 2012, he was a Visiting Scholar at the Center for Signal and Image Processing (CSIP), Georgia Institute of Technology, Atlanta. Currently, he is the director of the Speech and Multimedia Signal Processing Lab and an assistant professor at the Department of Communication Engineering, National Taipei University. His main research interests are in speech processing, in particular prosody modeling, automatic speech recognition, and text-to-speech systems.

Assoc. Prof. Jia-Ching Wang
Department of Computer Science and Information Engineering, National Central University

Jia-Ching Wang received the M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 1997 and 2002, respectively. He was an Honorary Fellow with the Department of Electrical and Computer Engineering, University of Wisconsin-Madison in 2008 and 2009. Currently, he is an Associate Professor with the Department of Computer Science and Information Engineering, National Central University, Jhongli City, Taiwan. His research interests include signal processing and VLSI architecture design. Dr. Wang is an honorary member of Phi Tau Phi Scholastic Honor Society and a member of the Association for Computing Machinery and IEICE.

Prof. Björn W. Schuller
University of Passau, Germany

Björn W. Schuller is Full Professor and Chair of Complex and Intelligent Systems at the University of Passau, Germany, Reader (Associate Professor) in Machine Learning at Imperial College London, UK, and the co-founding CEO of audEERING. Further affiliations include HIT, China, as Visiting Professor, and the University of Geneva, Switzerland, and Joanneum Research in Graz, Austria, as an Associate. Previously, he was with the CNRS-LIMSI in Orsay, France, and headed the Machine Intelligence and Signal Processing Group at TUM in Munich, Germany. There, he received his diploma in 1999, his doctoral degree in 2006, and his habilitation in 2012, and was granted the title of Adjunct Teaching Professor, all in electrical engineering and information technology. He is best known for his work advancing Intelligent Audio Analysis and Affective Computing. Dr Schuller is President Emeritus of the AAAC, an elected member of the IEEE SLTC, and a Senior Member of the IEEE. He has (co-)authored 5 books and more than 500 peer-reviewed technical contributions (more than 10,000 citations, h-index = 49). Selected activities include his roles as Editor in Chief of the IEEE Transactions on Affective Computing and Associate Editor of Computer Speech and Language, IEEE Signal Processing Letters, IEEE Transactions on Cybernetics, and IEEE Transactions on Neural Networks and Learning Systems. Professor Schuller was General Chair of ACM ICMI 2014, Program Chair of ACM ICMI 2013, IEEE SocialCom 2012, and ACII 2015 and 2011, as well as organiser of the INTERSPEECH 2009-2016 annual Computational Paralinguistics Challenges and the 2011-2016 annual Audio/Visual Emotion Challenges. He has won several awards, including best results in research challenges such as CHiME and MediaEval, and of ACM Multimedia. In 2015 and 2016 he was honoured as one of 40 extraordinary scientists under the age of 40 by the World Economic Forum.

Asst. Prof. Chi-Chun Lee
Department of Electrical Engineering, National Tsing Hua University

Chi-Chun Lee (Jeremy) is an Assistant Professor at the Electrical Engineering Department of National Tsing Hua University (NTHU), Taiwan. He received his B.S. degree with honors, magna cum laude, in electrical engineering from the University of Southern California (USC) in 2007, and his Ph.D. degree in electrical engineering from USC in 2012. He was a data scientist at the id:a lab at ID Analytics in 2013. He was awarded the USC Annenberg Fellowship. He led a team to win the Emotion Challenge at Interspeech 2009 and is a coauthor of a best paper at Interspeech 2010. He is a member of the Tau Beta Pi, Phi Kappa Phi, and Eta Kappa Nu honor societies.

His research interests are in interdisciplinary human-centered behavioral signal processing, emphasizing the development of computational frameworks in recognizing and quantifying human behavioral attributes and interpersonal interaction dynamics using machine learning and signal processing techniques.

Dr. Ying-Hui Lai
Research Center for Information Technology Innovation, Academia Sinica

Ying-Hui Lai received the B.S. degree in industrial education from National Taiwan Normal University in 2005 and the Ph.D. degree in biomedical engineering from National Yang-Ming University in 2013. From January 2010 to June 2012, Dr. Lai was a research and development (R&D) engineer at Aescu Technology, Taipei, Taiwan, where he engaged in research and product development for hearing aids. Currently, Dr. Lai is a postdoctoral fellow at the Research Center for Information Technology Innovation, Academia Sinica. His research focuses on hearing aids, cochlear implants, speech enhancement, and pattern recognition.

Program

Time | Topic | Speaker
08:30 - 09:00 | Registration | -
09:00 - 09:10 | Opening Remarks | Asst. Prof. Hung-yi Lee
09:10 - 10:10 | Recurrent Networks and Beyond | Dr. Tomas Mikolov
10:10 - 10:30 | Coffee Break | -
10:30 - 11:30 | Prosody Modeling and its Applications to Spoken Language Processing | Asst. Prof. Chen-Yu Chiang
11:30 - 12:30 | Robust Sound Event Recognition | Assoc. Prof. Jia-Ching Wang
12:30 - 14:00 | Lunch | -
14:00 - 15:00 | Say no more – the computer already deeply knows you? | Prof. Björn W. Schuller
15:00 - 16:00 | A window into you: BSP effort for quantifying human behaviors across domains of health, education, and psychology | Asst. Prof. Chi-Chun Lee
16:00 - 16:20 | Coffee Break | -
16:20 - 17:20 | Improvement of the speech intelligibility for cochlear implantees by the adaptive compression strategy and deep learning based noise reduction approaches | Dr. Ying-Hui Lai
17:20 - 17:30 | Closing Remarks | -

Opening Remarks

Asst. Prof. Hung-yi Lee

Recurrent Networks and Beyond

Dr. Tomas Mikolov

In this talk, I will give a brief overview of recurrent networks and their applications. I will then present several extensions that aim to help these powerful models learn more patterns from training data. These will include a simple modification of the architecture that allows the network to capture longer context information, and an architecture that can learn complex algorithmic patterns. The talk will conclude with a discussion of a long-term research plan for advancing machine learning techniques towards the development of artificial intelligence.
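
As background for the talk, the following sketch implements the core recurrence of a vanilla recurrent network in plain numpy, showing how the hidden state carries context across time steps. It is a generic illustration, not the RNNLM toolkit; all dimensions and data are arbitrary placeholders.

```python
# Vanilla RNN recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b):
# a generic illustration, not the RNNLM toolkit.
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_forward(xs):
    """Run the recurrence over a sequence of input vectors."""
    h = np.zeros(hidden_dim)
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b)  # hidden state carries context
        states.append(h)
    return states

sequence = [rng.normal(size=input_dim) for _ in range(5)]
print(rnn_forward(sequence)[-1][:4])  # the last state summarizes the sequence
```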

HOST: Asst. Prof. Hung-yi Lee

Coffee Break

Prosody Modeling and its Applications to Spoken Language Processing

Asst. Prof. Chen-Yu Chiang

The term prosody refers to certain inherent suprasegmental properties that carry the melodic, timing, and pragmatic information of continuous speech, encompassing accentuation, intonation, rhythm, speaking rate, prominences, pauses, and the attitudes or emotions a speaker intends to express. Prosodic features are physically encoded in the variations in pitch contour, energy level, duration, and silence of spoken utterances. Prosodic studies have indicated that these features are not produced arbitrarily, but rather are realized according to a hierarchically organized structure which demarcates the speech flow into domains of varying lengths by boundary or break cues. It is also known that hierarchical prosodic structures are highly correlated with information sources of the linguistic features (lexical, syntactic, semantic, and pragmatic), the para-linguistic features (intentional, attitudinal, and stylistic), and the non-linguistic features (physical and emotional). Therefore, we can regard prosodic information as an interface between messages generated by humans and the realized acoustic features of speech. We may also regard prosody as a communication protocol between speakers. This talk will introduce some advances in prosody modeling jointly developed by the Speech and Multimedia Signal Processing Lab, NTPU, and the Speech Processing Lab, NCTU. Applications of prosody modeling to automatic speech recognition (ASR) and text-to-speech (TTS) systems will also be addressed.
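
To make the acoustic correlates named above concrete, the sketch below extracts a pitch contour, frame energies, and a crude pause indicator. The librosa library and the file name "speech.wav" are assumptions for illustration only, not the labs' actual toolchain.

```python
# Sketch: extracting basic prosodic-acoustic features with librosa
# (an assumed library for illustration; "speech.wav" is a placeholder).
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)

# Pitch contour (F0) via probabilistic YIN; unvoiced frames come back NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Short-time energy via RMS.
rms = librosa.feature.rms(y=y)[0]

# Crude pause detection: frames whose energy falls well below the mean.
pauses = rms < 0.1 * rms.mean()

print("mean F0 over voiced frames:", np.nanmean(f0))
print("pause ratio:", pauses.mean())
```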

HOST: Prof. Chung-Hsien Wu

Robust Sound Event Recognition

Assoc. Prof. Jia-Ching Wang

Using sound event recognition in home environments has become a new research issue in home automation and smart homes. Identifying sound classes can significantly help home environmental monitoring, as predefined home automation services can be triggered by their associated sound classes. However, varying noise and interference often degrade recognition performance. These problems remain unsolved, and research to tackle them is greatly needed. In this talk, we will present several robust sound event recognition techniques, such as front-end processes that filter out noise or interference and an approach for extracting robust audio features.
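
As one concrete example of such a front end, the sketch below computes clip-level log-mel features and feeds them to a simple classifier. It is a generic baseline assembled from librosa and scikit-learn under assumed placeholder file names and labels, not the system presented in the talk.

```python
# Generic sound-event baseline: clip-level log-mel features + SVM.
# Not the speaker's system; file names and labels are placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC

def logmel_features(path, sr=16000, n_mels=40):
    """Average the log-mel spectrogram over time into one clip descriptor."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).mean(axis=1)  # shape: (n_mels,)

paths = ["door_knock.wav", "glass_break.wav", "speech.wav"]  # placeholders
labels = ["knock", "glass", "speech"]

X = np.stack([logmel_features(p) for p in paths])
clf = SVC().fit(X, labels)
print(clf.predict(X[:1]))
```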

HOST: Dr. Hsin-Min Wang

Say no more – the computer already deeply knows you?

Prof. Björn W. Schuller

Recent advances in deep and weakly supervised learning have helped to lend computers new socio-affective skills. Focusing on human speech analysis, this talk highlights current abilities and potential in the automatic characterisation of speakers in rich ways. This includes the acquisition of information on speakers' sincerity, deception, native language and degree of nativeness, cognitive and physical load, emotion and personality, or health diagnostics, to name just a few. A corresponding modern architecture for holistic speech analysis will be shown, including cooperative on-line learning by efficient crowd-sourcing. Further, an approach for end-to-end learning aimed at seamless speech modelling will be featured. Then, a low-resource implementation based on the openSMILE toolkit, co-developed by the presenter, is demonstrated, considering real-time on-device processing on smartphones and the like. Examples of application use-cases stem from a number of ongoing European projects; these will showcase the potential, but also current shortcomings. In an outlook, future avenues to best overcome these are laid out.
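
For readers who want to try the openSMILE toolkit mentioned above, the sketch below invokes its SMILExtract command-line extractor from Python. The file names are placeholders, and the configuration path is an assumption that varies between openSMILE releases.

```python
# Sketch: calling openSMILE's SMILExtract CLI from Python.
# "input.wav" is a placeholder; the config path differs across releases
# (IS09_emotion.conf ships with the openSMILE distribution).
import subprocess

subprocess.run(
    [
        "SMILExtract",
        "-C", "config/IS09_emotion.conf",  # feature configuration
        "-I", "input.wav",                 # input audio file
        "-O", "features.arff",             # extracted feature output
    ],
    check=True,
)
```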

HOST: Dr. Yi-Hsuan Yang

A window into you: BSP effort for quantifying human behaviors across domains of health, education, and psychology

Asst. Prof. Chi-Chun Lee

The abstraction of humans with a signals and systems framework naturally brings a synergy between engineering and the behavioral sciences. Behavioral signal processing (BSP) offers a new frontier of interdisciplinary research between these communities. The core research in BSP is to model human behaviors, internal states, and perceptual judgements from observational data by using computational methods grounded in signal processing and machine learning. The outcome of BSP offers novel informatics for enhancing the capabilities of domain experts in facilitating better decision making.

In this talk, we will demonstrate the use of BSP techniques in various application domains: affective computing, mental health, and educational research. The heterogeneity in human behavior expression, the subjectivity in human perceptual judgement, and the complex non-linear interplay of multiple influencing factors require not only advances in algorithmic development but also closer collaboration with domain experts. With this emerging effort of BSP, we strive not only to provide engineering solutions to domain experts but also to open up opportunities for novel insights in applications with broad societal impact.

HOST: Prof. Jen-Tzung Chien

Coffee Break

Improvement of the speech intelligibility for cochlear implantees by the adaptive compression strategy and deep learning based noise reduction approaches

Dr. Ying-Hui Lai

Cochlear implants (CIs) are surgically implanted electronic devices that provide a sense of sound to patients with severe to profound hearing loss. The considerable progress of CI technologies over the past three decades has enabled many CI users to enjoy a high level of speech understanding in quiet. For most CI users, however, understanding speech in noisy environments remains a challenge. In this talk, I will present two approaches that address this important issue to further improve speech intelligibility for CI recipients under noisy conditions. First, I will describe the proposed adaptive envelope compression (AEC) strategy, which enhances the modulation depth of the envelope waveform by making the best use of its dynamic range, thereby improving intelligibility for CI recipients compared with traditional static envelope compression. Second, I will introduce a deep-learning-based noise reduction (NR) approach, the deep denoising autoencoder (DDAE), whose effectiveness in improving speech intelligibility for CI recipients has been investigated. Experimental results indicate that, under challenging noisy listening conditions, the AEC strategy and DDAE NR yield higher intelligibility scores than conventional approaches for Mandarin-speaking listeners, suggesting that AEC and DDAE NR could potentially be integrated into a CI processor to overcome the speech perception degradation caused by noise.
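
To sketch the DDAE idea, the example below trains a small feed-forward network in PyTorch to map noisy spectral frames to clean ones. It is a generic illustration of denoising-autoencoder training, not the authors' model; the random tensors are placeholders for real paired noisy/clean training frames.

```python
# Minimal deep denoising autoencoder (DDAE) sketch: learn a mapping from
# noisy spectral frames to clean ones. A generic illustration only; the
# data below are random placeholders for real noisy/clean pairs.
import torch
import torch.nn as nn

frame_dim = 257  # e.g., one-sided spectrum of a 512-point FFT

model = nn.Sequential(
    nn.Linear(frame_dim, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, frame_dim),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.randn(1024, frame_dim)           # stand-in clean frames
noisy = clean + 0.3 * torch.randn_like(clean)  # stand-in noisy inputs

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(noisy), clean)  # regress noisy -> clean
    loss.backward()
    optimizer.step()

print("final MSE:", loss.item())
```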

HOST: Dr. Yu Tsao

Transportation

National Taiwan University - No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan

To help protect the environment and support energy-saving, carbon-reduction policies, we recommend taking public transportation (the MRT) to the university.

  • (Songshan-Xindian Line) - National Taiwan University is located right next to MRT Gongguan Station (Songshan-Xindian Line). Exit 3 of Gongguan Station is only about a 3-minute walk from the university's main gate (at the intersection of Roosevelt Rd. and Xinsheng S. Rd.), and Exit 2 connects directly to the campus entrance on Zhoushan Rd. (at the intersection of Roosevelt Rd. and Zhoushan Rd.), which is very quick and convenient.
  • (Wenhu Line) - If you wish to reach the northeast side of the main campus, consider taking the MRT Wenhu Line, getting off at Technology Building Station, and walking south along Fuxing S. Rd. to the university's Xinhai gate.

Contact Us

GENERAL CO-CHAIRS
Asst. Prof. Hung-yi Lee
hungyilee@ntu.edu.tw

No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan

Prof. Lin-shan Lee
lslee@gate.sinica.edu.tw

No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan

WORKSHOP EMAIL
sws2016.ntuspeech@gmail.com

If you have any questions, please send them to this address.

REGISTRATION
Ms. 黃琪
aclclp@hp.iis.sinica.edu.tw

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

02-2788-3799 #1502