Visiting Professor Course

September 8-16, 2003


Auditory Models and Their Applications for Speech Recognition and Enhancement

Lecturer: Dr. Divényi Péter (Pierre L. Divenyi),
Speech and Hearing Research Facility, VA Northern California Health Care Systems and East Bay Institute for Research and Education, Martinez, California, USA.

This short course will focus on the transformation of speech from acoustic signal to perceived auditory object. It will first summarize basic properties of the auditory system and introduce some of the principal models of auditory spectral, temporal, and spatial analysis. Next, it will examine the speech signal from a psychoacoustic point of view and identify its major spectral and temporal characteristics. The implications of masking noise, i.e., speech subjected to interference by periodic or random signals, will also be considered. Finally, the course will briefly survey auditory attention and its role in disambiguating speech presented in complex auditory displays.


Characterization of the auditory system

1.1.    Brief history of hearing research: Pythagoras, Helmholtz, Békésy, Fletcher, Schouten, Zwicker, D.M. Green; basic aspects of frequency (i.e., channel) analysis based on tonotopic organization throughout the whole auditory system, and time analysis based on synchrony of neural discharge patterns.

1.2.    Major psychoacoustic phenomena: pitch, masking and lateral suppression, AM sensitivity, localization, cross-channel interaction; principal models that depict the response of the auditory system to simple and moderately complex sounds.

Speech perception

2.1.    Generally used physical representations of speech (spectrogram, cepstrum, oscillogram, correlogram) and representations of speech as viewed by the brain after peripheral auditory processing (“neural spectrogram,” etc.).

2.2.    Vocal sources; voiced and unvoiced segments, plosives and continuants, formants; vowels and consonants; perception of phonemes, words, sentences; advantages of auditory representation of speech and its prediction of the effect of noise on intelligibility.

Speech perception in interference

Natural speech communication situations: speech in noise (random and quasi-periodic), in reverberation, and in crowd noise (the “cocktail-party” effect); data and models of auditory segregation of multiple sources (speech and non-speech) and putative auditory and/or central nervous system mechanisms underlying this process; modes of segregation (primitive, data-driven and conscious, schema-driven, i.e., attention-bound) and prevalent models of auditory scene analysis.

Recapitulation and final discussion

As an engineering-oriented summary of the topics covered, an “ideal auditory system,” capable of flawless speech perception in quiet and in a “cocktail-party” situation, will be jointly designed.



8 Sept: Lecture 1.1, Lecture 1.2, sound sample 1, sound sample 2
Wednesday, 10 Sept: Lecture 2.1, Lecture 2.2, Auditory Toolkit (10 MB)
12 Sept: Lecture 3.1, Lecture 3.2
16 Sept: final exam




9. Sept


11. Sept


15. Sept


10 - 12

St. 202








The course is part of Dr. Vicsi Klára's speech acoustics PhD subject. The optional exam taken at the end of the course (it may also be taken in Hungarian) counts toward the grade for that subject.