Visiting Professor Course

8-16 September 2003

BME TMIT

Auditory Models and Their Applications for Speech Recognition and Enhancement

Lecturer: Dr. Divényi Péter (Pierre L. Divenyi),
Speech and Hearing Research Facility, VA Northern California Health Care Systems, and East Bay Institute for Research and Education, Martinez, California, USA.

This short course will focus on the transformation of speech from acoustic signal to perceived auditory object. It will first summarize basic properties of the auditory system and introduce some of the principal models of auditory spectral, temporal, and spatial analysis. Next, it will examine the speech signal from a psychoacoustic point of view and identify its major spectral and temporal characteristics. The implications of masking noise, i.e., of speech subjected to interference by periodic or random signals, will also be considered. Finally, the course will briefly survey auditory attention and its role in disambiguating speech presented in complex auditory displays.

Synopsis

Characterization of the auditory system

1.1.    Brief history of hearing research: Pythagoras, Helmholtz, Békésy, Fletcher, Schouten, Zwicker, D.M. Green; basic aspects of frequency (i.e., channel) analysis based on tonotopic organization throughout the whole auditory system, and of time analysis based on synchrony of neural discharge patterns.

1.2.    Major psychoacoustic phenomena: pitch, masking and lateral suppression, AM sensitivity, localization, cross-channel interaction; principal models that depict the response of the auditory system to simple and moderately complex sounds.

Speech perception

2.1.    Generally used physical representations of speech (spectrogram, cepstrum, oscillogram, correlogram) and representations of speech as viewed by the brain after peripheral auditory processing (“neural spectrogram,” etc.).

2.2.    Vocal sources; voiced and unvoiced segments, plosives and continuants, formants; vowels and consonants; perception of phonemes, words, sentences; advantages of auditory representation of speech and its prediction of the effect of noise on intelligibility.

Speech perception in interference

Natural speech communication situations: speech in noise (random and quasi-periodic), in reverberation, and in crowd noise (the “cocktail-party” effect); data and models of auditory segregation of multiple sources (speech and non-speech) and putative auditory and/or central nervous system mechanisms underlying this process; modes of segregation (primitive data-driven and conscious schema-driven, i.e., attention-bound) and prevalent models of auditory scene analysis.

Recapitulation and final discussion

As an engineering-oriented summary of the topics covered, an “ideal auditory system,” capable of flawless speech perception in quiet and in a “cocktail-party” situation, will be jointly designed.

Schedule

Lectures, 9:00-13:30, room I.B.210

Monday, 8 Sept        Lecture 1.1; Lecture 1.2; sound sample 1; sound sample 2
Wednesday, 10 Sept    Lecture 2.1; Lecture 2.2; Auditory Toolkit (10MB)
Friday, 12 Sept       Lecture 3.1; Lecture 3.2
Tuesday, 16 Sept      final exam

Consultations, 10:00-12:00, room St. 202

Tuesday, 9 Sept       consultation
Thursday, 11 Sept     consultation
Monday, 15 Sept       consultation

The course is part of Dr. Vicsi Klára's PhD subject on speech acoustics. The optional exam taken at the end of the course (it may also be taken in Hungarian) counts toward the result of that subject.