Scientific and technical results
Speech prosody.
Speech prosody plays an important role in automatic speech understanding and recognition, and a number of studies justify the usefulness of automatic detection of prosodic events in speech recognition. In the Laboratory of Speech Acoustics, a special heuristic intonation and stress classification scheme was developed, based on the fixed stress of Hungarian, which automatically identifies and classifies phonological phrases in speech. The identified phonological phrase boundaries have been shown to coincide often with word boundaries. Phonological phrases are known to constrain lexical access and hence play an important role in human speech perception, which can also be exploited in automatic speech understanding. The system has been tested on Hungarian and Finnish. Based on this work, a prosodic segmenter (phonological phrase aligner) tool was also developed, which is easily adaptable to fixed-stress languages. Instead of phonological phrases, intonation phrases can also be aligned, allowing sentence-level segmentation of speech and automatic sentence mood recognition. Current research focuses on the prosody/syntax interface and on prosodic modelling of spontaneous speech for automatic speech recognition and understanding tasks.

Emotion recognition.
Statistical examinations of basic emotions are carried out on databases of read speech and of continuous, spontaneous speech. The databases were created by collecting spontaneous speech from telephone conversations and from various TV programmes. Different acoustic features (pitch, intensity, spectral features) were extracted and fed to a Support Vector Machine classifier. Four basic emotions (anger, joy, neutral (comfort) and sadness) are classified on the basis of intonational phrase-sized units, with a recognition rate of approximately 79%.
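As a rough illustration of the feature extraction step, the sketch below turns the pitch and intensity contours of one intonational phrase-sized unit into a fixed-length feature vector. The choice of statistics (mean, standard deviation, range), the function names `contour_stats` and `phrase_features`, and the convention that an F0 value of 0 marks an unvoiced frame are all assumptions made for this example; the source does not specify them. Spectral features and the SVM itself are not shown.

```python
from statistics import mean, pstdev


def contour_stats(values):
    """Return [mean, standard deviation, range] of one contour."""
    if not values:
        # Empty contour (e.g. no voiced frames): fall back to zeros.
        return [0.0, 0.0, 0.0]
    return [mean(values), pstdev(values), max(values) - min(values)]


def phrase_features(f0_hz, intensity_db):
    """Build one feature vector for an intonational phrase-sized unit.

    f0_hz: per-frame pitch in Hz; 0 marks unvoiced frames (assumed convention).
    intensity_db: per-frame intensity in dB.
    """
    voiced = [f for f in f0_hz if f > 0]  # ignore unvoiced frames
    return contour_stats(voiced) + contour_stats(intensity_db)


# Hypothetical contours for a single unit.
features = phrase_features([0.0, 100.0, 200.0, 0.0, 300.0], [60.0, 62.0, 64.0])
print(features)
```

A vector of this kind, extended with spectral features, would then be passed to an SVM implementation of choice for training and classification.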
An automatic speech detection and segmentation system is under development; it segments continuous speech into the intonational phrase-sized units needed for automatic emotion recognition.

Tóth Sz L, Sztahó D, Vicsi K. Speech Emotion Perception by Human and Machine. In: Proceedings of COST Action 2102 International Conference: Revised Papers in Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction. Patras, Greece, 2007.10.29-2007.10.31. Springer, pp. 213-224. (ISBN: 978-3-540-70871-1)

Vicsi K, Sztahó D. Problems of the Automatic Emotion Recognitions in Spontaneous Speech; An Example for the Recognition in a Dispatcher Center. In: Anna Esposito et al. (eds.) Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces. Theoretical and Practical Issues: Third COST 2102 International Training School, Caserta, Italy, March 15-19, 2010, Revised Selected Papers. Italy, 2010.03.15-2010.03.19. London: Springer, pp. 331-339. (ISBN: 978-3-642-18183-2) DOI: 10.1007/978-3-642-18184-9_28

Sztahó D, Imre V, Vicsi K. Automatic Classification of Emotions in Spontaneous Speech. In: Anna Esposito, Alessandro Vinciarelli, Klára Vicsi, Catherine Pelachaud, Anton Nijholt (eds.) Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issues. Budapest, Hungary, 2010.09. Berlin; Heidelberg: Springer-Verlag, pp. 229-239. (ISBN: 978-3-642-25774-2)

Multilingual speech-controlled telematics framework for personalised automotive services.
NKFP072-TELEAUTO (2008-2010)
The aim of the PACS (people assisted computer system) concept is to let users ask questions in natural language and get answers based on their geographical location. The process handles data and voice connections in a unified manner. A conversation begins with the user's request and ends with the download of the desired data. The questions are interpreted by a speech recognition framework.
Should it fail, or be unable to serve the user's request, a human operator takes over its role.

Child-Computer Interaction Techniques: Exploring the Prosody.
TéT - Balaton project (2008-2009)
One of the primary aims of the project is to improve computer-assisted speech learning methods for handicapped children (hard of hearing and autistic children) by introducing prosody models. These models will first be used to develop the prosody of the speakers. Prosody plays an important role in communication: prosodic cues help listeners understand the messages in a conversation, so the correct production of the different prosodic cues is very important. Moreover, prosody will be used to determine the emotional state of the speaker, which can be taken into account in the interface.

Cross-Modal Analysis of Verbal and Non-verbal Communication
COST Action 2102 (2006-2011)
The main objective of the Action is to develop an advanced acoustical, perceptual and psychological analysis of verbal and non-verbal communication signals originating in spontaneous face-to-face interaction, in order to find algorithms and automatic procedures capable of identifying human emotional states. Several key aspects will be considered, such as the integration of the developed algorithms and procedures for application in telecommunication, and for the recognition of emotional states, gestures, speech and facial expressions, in anticipation of the implementation of intelligent avatars and interactive dialogue systems that could be exploited to improve user access to future telecommunication services.
Partners: countries from Europe

Continuous speech recognition:
- A development tool (MKBF 1.0) for constructing continuous speech recognizers has been created under Windows XP. The system is based on a statistical approach (HMM phoneme models and bi-gram language models with non-linear smoothing) and works in real time. The tool can construct a medium-sized speech recognizer with a vocabulary of 1000-20000 words. New solutions have been developed for the acoustical pre-processing, for the statistical model building of phonemes, and at the syntactic level (IKTA, OTKA). (2004- )
- A prosodic recognizer has been developed. Using this recognizer, a cross-lingual study was carried out for agglutinative, fixed-stress languages such as Hungarian and Finnish, on the word-level segmentation of continuous speech by examining supra-segmental parameters. Different algorithms were developed, based either on a rule-based or on a data-driven approach. The best results were obtained by data-driven algorithms (HMM-based methods) using the time series of fundamental frequency and energy together. Word boundaries were marked with acceptable accuracy, although not all of them could be found. Based on this study, a word-level segmenter has been developed which can indicate word boundaries with acceptable precision for both languages (IKTA-00056/2003 project).

Statistical examination of languages:
- Search for the optimal unit of a CSR system.
- Examination of pronunciation variation from hand-labelled corpora. A data-driven method has been used to examine the pronunciation variation rules. A systematic statistical analysis was carried out, and the results demonstrate how pronunciation variation depends on the position and the connection of the sound. Pronunciation matrices were constructed, presenting the probability of the occurrences. Word-internal and cross-word rules were examined separately and compared. The examination was based on Hungarian speech databases, but the method is adaptable to other languages. [B3], [B4] (IKTA, OMFB) (2003- )
- Rule-based and statistical language model for Hungarian speech recognition (IKTA, OMFB) (2004- )

Database collection:
- Hungarian Reference Speech Database: read speech database collection in office-like environments for training and testing continuous speech recognition programs running on PCs (IKTA). (2004-2005)
- SpeechDat(E): European project (1999- ). Speech database collection through telephone lines. A realistic base both for the training and testing of present-day teleservices and for the training of real speaker-independent recognizers. [D6], [E6], [E7]
- MTBA, BESZTEL: speech database collection through telephone lines and mobile telephones. [C3], [C16] (2000-2003)

Speech perception:
- The most relevant acoustical parameters for the perception of different speech sounds were determined.

Speech analysis:
- A simplified auditory model (involving the Bark scale, masking and loudness) was developed, and speech sounds were analyzed with this model [7], [8]. (1983-1985)

Speech processing for speech handicapped:
- The "Speech Corrector", an audio-visual speech-pronunciation teaching system, has been developed for speech handicapped people. The database of the Corrector system for Hungarian and German is finished. [16], [17], [20]. The system makes use of speech recognition results. A demo of the program is available.
- Interactive hearing and speech perception therapy through the Internet (item 39). The proposed hearing and speech perception training program will help the education of hearing-impaired children through didactically well-constructed and playful exercises using specially constructed databases. Poor hearing abilities are stimulated by different environmental sounds and special speech examples. In this way we develop the ability to perceive and distinguish different sounds, and we also train the children's speech understanding and combination skills. The program will also help the rehabilitation of children with cochlear implants. The program will be available through the Internet for everybody, free of charge. [D1], [E2] (2003)

Speech enhancement:
- Some noise reduction methods were developed for speech recognizers [12], [E18], [E19] (1988-1990)

Isolated word recognition:
- Speaker-dependent small-vocabulary system (max. 80 words), stand-alone [6], [E16];
- Speaker-dependent medium-vocabulary system (some hundred words) for IBM PCs [9].
- Speaker-independent isolated speech recognizer. It recognizes numbers and short instructions over a telephone line and in different sound fields (1989- )

Dialogue systems:
- Adaptation of well-known systems to the Hungarian language and to Hungarian habits (1998- )

Annotation and segmentation:
- Automatic speech segmentation of continuous speech at the phonetic and sub-phonetic level, automatic labelling (1997- )
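
One way to picture the bi-gram language model with smoothing mentioned for the continuous speech recognition tool above is the sketch below. It assumes absolute discounting interpolated with a unigram distribution, which is one common non-linear smoothing scheme; the source does not say which scheme the tool actually uses. The function name `train_bigram`, the `discount` parameter and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sentences, discount=0.5):
    """Return prob(word, prev) for a bigram model with absolute discounting.

    Assumed scheme (not taken from the source):
    P(w | v) = max(c(v, w) - D, 0) / c(v) + D * n(v) / c(v) * P_uni(w),
    where n(v) is the number of distinct words seen after v.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(tokens[1:])  # "<s>" is never predicted
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[prev][word] += 1
    total = sum(unigrams.values())

    def prob(word, prev):
        p_uni = unigrams[word] / total
        followers = bigrams.get(prev)
        if not followers:
            # Unseen history: fall back to the unigram estimate.
            return p_uni
        history_count = sum(followers.values())
        discounted = max(followers[word] - discount, 0.0) / history_count
        backoff_mass = discount * len(followers) / history_count
        return discounted + backoff_mass * p_uni

    return prob


# Toy corpus, invented for illustration only.
corpus = [["open", "the", "door"], ["close", "the", "door"], ["open", "the", "window"]]
prob = train_bigram(corpus)
print(prob("door", "the"), prob("window", "the"), prob("open", "the"))
```

The discount removes a fixed amount of probability mass from every observed bigram and redistributes it through the unigram distribution, so word pairs never seen in training still receive a non-zero probability.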