September 8-16
and Their Applications for Speech Recognition and Enhancement
Dr. Divényi Péter
(Pierre L. Divenyi),
Speech and Hearing Research Facility, VA Northern California Health
Care Systems and East Bay Institute for Research and Education, Martinez,
This short course will focus on the transformation of speech from acoustic
signal to perceived auditory object. It will first summarize basic properties
of the auditory system and introduce some of the principal models of
auditory spectral, temporal, and spatial analysis. Next, it will examine
the speech signal from a psychoacoustic point of view and identify its
major spectral and temporal characteristics. The implications of masking
noise, i.e., speech subjected to interference by periodic or random
signals, will also be considered. Finally, the course will briefly survey
auditory attention and its role in disambiguating speech presented in
complex auditory displays.
of the auditory system
Brief history of hearing research: Pythagoras, Helmholtz,
Békésy, Fletcher, Schouten,
Zwicker, D.M. Green; basic aspects of frequency
(i.e., channel-) analysis based on tonotopic organization throughout the whole auditory system,
and time analysis based on synchrony of neural discharge patterns.
Major psychoacoustic phenomena: pitch, masking and
lateral suppression, AM sensitivity, localization, cross-channel interaction;
principal models that depict the response of the auditory system to
simple and moderately complex sounds.
Generally used physical representations of speech
(spectrogram, cepstrum, oscillogram, correlogram) and representations
of speech as viewed by the brain after peripheral auditory processing
(“neural spectrogram,” etc.).
Vocal sources; voiced and unvoiced segments, plosives
and continuants, formants; vowels and consonants; perception of phonemes,
words, sentences; advantages of auditory representation of speech
and its prediction of the effect of noise on intelligibility.
speech communication situations: speech in noise (random and quasi-periodic),
in reverberation, and in crowd noise (the “cocktail-party” effect); data
and models of auditory segregation of multiple sources (speech and
non-speech) and the putative auditory and/or central nervous system mechanisms
underlying this process; modes of segregation (primitive, data-driven,
and conscious, schema-driven, i.e., attention-bound) and prevalent
models of auditory scene analysis.
and final discussion
As an engineering-oriented summary of the topics covered, an “ideal
auditory system,” capable of flawless speech perception in quiet and
in a “cocktail-party” situation, will be jointly designed.
The course is part of Dr. Vicsi Klára's PhD subject on speech acoustics.
The [...] taken at the end of the course
([...] language as well) counts toward
the result of that subject.